Fusion Methods for ICD10 Code Classification of Death Certificates in Multilingual Corpora

Authors

Mike Ebersbach

Robert Herms

Maximilian Eibl

Abstract

In this working notes paper, we present our methodology and results for Task 1 of the CLEF eHealth Evaluation Lab 2017. This benchmark addresses information extraction from written text, with a focus on unexplored language corpora, specifically English and French. The goal is to automatically assign ICD10 codes to the text content of death certificates. Our approach focuses on fusion methods in conjunction with support vector machines for ICD10 code classification. First, we composed a large-scale feature set comprising more than 40k features based on bag of words, bag of 2-grams, bag of 3-grams, latent Dirichlet allocation, and the ontologies of WordNet and UMLS. In the development phase, we evaluated three different methods: each feature type separately (no fusion), early feature-level fusion, and late fusion with the majority vote, maximum, and average rules. For the English test set, the best F-measure was 0.8187, obtained with early fusion. For the two French test sets, we achieved 0.6692 and 0.7216 using late fusion with the average rule applied to bag of words and bag of 2-grams.
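The difference between the fusion strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `early_fusion` and `late_fusion`, the tie-breaking by code order, and the toy scores are all assumptions made for illustration, and the per-classifier scores are assumed to come from some classifier (for example an SVM) trained on one feature type each.

```python
from collections import Counter


def early_fusion(feature_vectors):
    """Feature-level fusion: concatenate per-type feature vectors into one vector."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


def late_fusion(score_dicts, rule="average"):
    """Decision-level fusion of per-classifier scores (hypothetical helper).

    score_dicts: one dict per classifier, mapping ICD10 code -> score.
    rule: "majority_vote", "maximum", or "average".
    Ties are broken by alphabetical code order (an arbitrary assumption).
    """
    if not score_dicts:
        raise ValueError("need at least one classifier output")
    codes = sorted(set().union(*score_dicts))
    if rule == "majority_vote":
        # Each classifier votes for its own top-scoring code.
        votes = Counter(max(sorted(s), key=s.get) for s in score_dicts)
        best = max(votes.values())
        return min(code for code, n in votes.items() if n == best)
    if rule == "maximum":
        fused = {c: max(s.get(c, 0.0) for s in score_dicts) for c in codes}
    elif rule == "average":
        fused = {c: sum(s.get(c, 0.0) for s in score_dicts) / len(score_dicts)
                 for c in codes}
    else:
        raise ValueError(f"unknown rule: {rule}")
    return max(codes, key=fused.get)


# Made-up toy scores for three feature types (not results from the paper).
bow = {"I219": 0.6, "J189": 0.3, "C349": 0.1}
bigrams = {"I219": 0.2, "J189": 0.7, "C349": 0.1}
trigrams = {"I219": 0.5, "J189": 0.1, "C349": 0.4}

for rule in ("majority_vote", "maximum", "average"):
    print(rule, late_fusion([bow, bigrams, trigrams], rule))
```

With early fusion, a single classifier is trained on the concatenated vector; with late fusion, one classifier per feature type is trained and only their outputs are combined, so the chosen rule can change the final code even when the underlying scores are identical.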

Similar resources

SIBM at CLEF eHealth Evaluation Lab 2017: Multilingual Information Extraction with CIM-IND

This paper presents SIBM’s participation in the Task 1: Multilingual Information Extraction ICD10 coding of the CLEF eHealth 2017 evaluation initiative which focuses on named entity recognition in French and English death certificates. We addressed the identification of relevant clinical entities within the International Classification of Diseases version 10 (ICD10) in the CépiDC and CDC datase...

CLEF eHealth 2017 Multilingual Information Extraction task Overview: ICD10 Coding of Death Certificates in English and French

This paper reports on Task 1 of the 2017 CLEF eHealth evaluation lab which extended the previous information extraction tasks of ShARe/CLEF eHealth evaluation labs. The task continued with coding of death certificates, as introduced in CLEF eHealth 2016. This largescale classification task consisted of extracting causes of death as coded in the International Classification of Diseases, tenth re...

A Reproducible Approach with R Markdown to Automatic Classification of Medical Certificates in French

English. In this paper, we report the ongoing developments of our first participation to the Cross-Language Evaluation Forum (CLEF) eHealth Task 1: “Multilingual Information Extraction ICD10 coding” (Névéol et al., 2017). The task consists in labelling death certificates, in French with international standard codes. In particular, we wanted to accomplish the goal of the ‘Replication track’ of t...

ECSTRA-INSERM @ CLEF eHealth2016-task 2: ICD10 Code Extraction from Death Certificates

This paper describes the participation of ECSTRA-INSERM team at CLEF eHealth 2016, task 2.C. The task involves extracting ICD10 codes from death certificates, mainly described with short plain texts. We cast the task as a machine learning problem involving the prediction of the ICD10 codes (categorical variable) from the raw text transformed into a bag-of-words matrix. We rely on probabilistic ...

A Study of the Causes of Death in Kashan County, 1377-81

Introduction: Epidemiologic review generally begins with death data. Over time, the causes of death have changed, shifting from infectious diseases to chronic diseases. Death indicators are therefore good instruments for assessing community health, and they can support constructive recommendations for reducing causes of death. Methods: This descr...

Publication date: 2017
